AITopics | machine translation system

machine translation system

Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval

Agarwal, Shantanu, Barry, Joel, Boschee, Elizabeth, Miller, Scott

arXiv.org Artificial Intelligence, Nov-6-2025

Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative targeted to advance the state of cross-lingual information retrieval (CLIR). This report provides a detailed description of Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Language's (SARAL's) effort for MATERIAL. Specifically, we outline our team's novel approach to CLIR, with an emphasis on retrieving a query-relevant document set rather than just a ranked document list. In MATERIAL's Phase-3 evaluations, SARAL exceeded the performance of other teams in five out of six evaluation conditions spanning three different languages (Farsi, Kazakh, and Georgian).

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 2511.03228

Country:

North America > United States > Ohio (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.69)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.92)

Trainable Reference-Based Evaluation Metric for Identifying Quality of English-Gujarati Machine Translation System

Joshi, Nisheeth, Katyayan, Pragya, Arora, Palak

arXiv.org Artificial Intelligence, Oct-8-2025

Machine Translation (MT) evaluation is an integral part of the MT development life cycle. Without analyzing the outputs of MT engines, it is impossible to evaluate the performance of an MT system. Experiments have shown that what works for English and other European languages does not work well for Indian languages. Thus, in this paper, we introduce a reference-based MT evaluation metric for Gujarati that is based on supervised learning. We trained two versions of the metric, both using 25 features. One model is trained using 6 hidden layers with 500 epochs, while the other is trained using 10 hidden layers with 500 epochs. To test the performance of the metric, we collected 1000 MT outputs of seven MT systems. These MT engine outputs were compared with one human reference translation. Compared with other available metrics, the developed metrics correlated better with human judgments.
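
Correlation with human judgment, the criterion this abstract uses to compare metrics, is commonly reported as a Pearson coefficient between metric scores and human ratings. The sketch below is a minimal illustration under that assumption; the abstract does not say which correlation statistic the authors used, and the function name `pearson` and the sample scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one automatic score and one human rating per MT output.
metric_scores = [0.31, 0.58, 0.44, 0.90, 0.12]
human_ratings = [2.0, 3.5, 3.0, 4.5, 1.0]
print(f"Pearson r = {pearson(metric_scores, human_ratings):.3f}")
```

A value near 1 means the metric ranks outputs much as the human raters do; comparing this value across metrics is one way to decide which metric tracks human judgment best.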

machine learning, natural language, translation, (17 more...)

doi: 10.1063/5.0248263

arXiv: 2510.05113

Country: Asia > India (0.96)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only

Gao, Xuanqi, Jiang, Weipeng, Zhai, Juan, Ma, Shiqing, Xie, Siyi, Yin, Xinyang, Shen, Chao

arXiv.org Artificial Intelligence, Jul-21-2025

The advent of neural machine translation (NMT) has revolutionized cross-lingual communication, yet preserving stylistic nuances remains a significant challenge. While existing approaches often require parallel corpora for style preservation, we introduce Babel, a novel framework that enhances stylistic fidelity in NMT using only monolingual corpora. Babel employs two key components: (1) a style detector based on contextual embeddings that identifies stylistic disparities between source and target texts, and (2) a diffusion-based style applicator that rectifies stylistic inconsistencies while maintaining semantic integrity. Our framework integrates with existing NMT systems as a post-processing module, enabling style-aware translation without requiring architectural modifications or parallel stylistic data. Extensive experiments on five diverse domains (law, literature, scientific writing, medicine, and educational content) demonstrate Babel's effectiveness: it identifies stylistic inconsistencies with 88.21% precision and improves stylistic preservation by 150% while maintaining a high semantic similarity score of 0.92. Human evaluation confirms that translations refined by Babel better preserve source text style while maintaining fluency and adequacy.

machine learning, natural language, translation, (18 more...)

arXiv: 2507.13395

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (0.93)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

The first open machine translation system for the Chechen language

Umishov, Abu-Viskhan A., Grigorian, Vladislav A.

arXiv.org Artificial Intelligence, Jul-18-2025

We introduce the first open-source model for translation between the vulnerable Chechen language and Russian, and the dataset collected to train and evaluate it. We explore fine-tuning as a way to add a new language to NLLB-200, a large language model system for multilingual translation. The BLEU / ChrF++ scores for our model are 8.34 / 34.69 and 20.89 / 44.55 for translation from Russian to Chechen and the reverse direction, respectively. The release of the translation models is accompanied by the distribution of parallel corpora of words, phrases, and sentences and a multilingual sentence encoder adapted to the Chechen language.
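
For readers unfamiliar with character-level scores such as the ChrF++ figures quoted above, the following is a simplified sketch of a chrF-style F-score over character n-grams. It is an illustration only: it ignores whitespace, omits the word n-grams that ChrF++ adds, and will not reproduce the paper's numbers. The function names and the defaults `max_n=6` and `beta=2.0` are assumptions chosen for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 100] for one sentence pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical sentence pair, for illustration only.
print(simple_chrf("the cat sat on a mat", "the cat sat on the mat"))
```

Because it matches characters rather than whole words, a score of this kind gives partial credit for near-miss word forms, which is one reason character-level metrics are often reported alongside BLEU for morphologically rich languages.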

artificial intelligence, natural language, translation, (15 more...)

arXiv: 2507.12672

Country: Europe > Russia > North Caucasian Federal District (0.46)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Natural language processing for African languages

Adelani, David Ifeoluwa

arXiv.org Artificial Intelligence, Jul-2-2025

Recent advances in word embeddings and language models use large-scale, unlabelled data and self-supervised learning to boost NLP performance. Multilingual models, often trained on web-sourced data like Wikipedia, face challenges: few low-resource languages are included, their data is often noisy, and the lack of labelled datasets makes it hard to evaluate performance outside high-resource languages like English. In this dissertation, we focus on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced in terms of the availability of labelled data for NLP tasks and unlabelled data found on the web. We analyse the noise in the publicly available corpora and curate a high-quality corpus, demonstrating that the quality of semantic representations learned in word embeddings depends not only on the amount of data but also on the quality of pre-training data. We demonstrate empirically the limitations of word embeddings and the opportunities that multilingual pre-trained language models (PLMs) offer, especially for languages unseen during pre-training and in low-resource scenarios. We further study how to adapt and specialize multilingual PLMs to unseen African languages using a small amount of monolingual text. To address the under-representation of African languages in NLP research, we developed large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks: named entity recognition and machine translation. We conduct an extensive empirical evaluation using state-of-the-art methods across supervised, weakly-supervised, and transfer learning settings.

artificial intelligence, large language model, machine learning, (23 more...)

arXiv: 2507.00297

Country:

Africa > Middle East (1.00)
Africa > Nigeria (0.93)
Asia > Middle East (0.92)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Information Technology (1.00)
Government > Regional Government (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Low-resource Machine Translation for Code-switched Kazakh-Russian Language Pair

Borisov, Maksim, Kozhirbayev, Zhanibek, Malykh, Valentin

arXiv.org Artificial Intelligence, Mar-25-2025

Machine translation for low-resource language pairs is a challenging task. It can become extremely difficult when a speaker uses code-switching. We propose a method to build a machine translation model for the code-switched Kazakh-Russian language pair with no labeled data. Our method is based on the generation of synthetic data. Additionally, we present the first code-switching Kazakh-Russian parallel corpus and the evaluation results, which include a model achieving 16.48 BLEU, almost reaching an existing commercial system and beating it in human evaluation.

computational linguistic, machine learning, natural language, (18 more...)

arXiv: 2503.20007

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
(16 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Dialectal and Low-Resource Machine Translation for Aromanian

Jerpelea, Alexandru-Iulius, Rădoi, Alina, Nisioi, Sergiu

arXiv.org Artificial Intelligence, Jan-7-2025

This paper presents the process of building a neural machine translation system with support for English, Romanian, and Aromanian - an endangered Eastern Romance language. The primary contribution of this research is twofold: (1) the creation of the most extensive Aromanian-Romanian parallel corpus to date, consisting of 79,000 sentence pairs, and (2) the development and comparative analysis of several machine translation models optimized for Aromanian. To accomplish this, we introduce a suite of auxiliary tools, including a language-agnostic sentence embedding model for text mining and automated evaluation, complemented by a diacritics conversion system for different writing standards. This research brings contributions to both computational linguistics and language preservation efforts by establishing essential resources for a historically under-resourced language. All datasets, trained models, and associated tools are public: https://huggingface.co/aronlp and https://arotranslate.com

aromanian, computational linguistic, translation, (14 more...)

arXiv: 2410.17728

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Task-Oriented Dialog Systems for the Senegalese Wolof Language

Mbaye, Derguene, Diallo, Moussa

arXiv.org Artificial Intelligence, Dec-15-2024

In recent years, we have seen considerable interest in conversational agents with the rise of large language models (LLMs). Although they offer considerable advantages, LLMs also present significant risks, such as hallucination, which hinder their widespread deployment in industry. Moreover, low-resource languages such as African ones are still underrepresented in these systems, limiting their performance in these languages. In this paper, we illustrate a more classical approach based on modular architectures of Task-oriented Dialog Systems (ToDS), offering better control over outputs. We propose a chatbot generation engine based on the Rasa framework and a robust methodology for projecting annotations onto the Wolof language using an in-house machine translation system. After evaluating a generated chatbot trained on the Amazon Massive dataset, our Wolof Intent Classifier performs similarly to the one obtained for French, which is a resource-rich language. We also show that this approach is extensible to other low-resource languages, thanks to the intent classifier's language-agnostic pipeline, simplifying the design of chatbots in these languages.

large language model, natural language, translation, (19 more...)

arXiv: 2412.11203

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
(14 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Prompting ChatGPT for Translation: A Comparative Analysis of Translation Brief and Persona Prompts

He, Sui

arXiv.org Artificial Intelligence, Apr-28-2024

Prompt engineering has shown potential for improving translation quality in LLMs. However, the possibility of using translation concepts in prompt design remains largely underexplored. Against this backdrop, the current paper discusses the effectiveness of incorporating the conceptual tool of translation brief and the personas of translator and author into prompt design for translation tasks in ChatGPT. Findings suggest that, although certain elements are constructive in facilitating human-to-human communication for translation tasks, their effectiveness is limited for improving translation quality in ChatGPT. This accentuates the need for explorative research on how translation theorists and practitioners can develop the current set of conceptual tools rooted in the human-to-human communication paradigm for translation purposes in this emerging workflow involving human-machine interaction, and how translation concepts developed in translation studies can inform the training of GPT models for translation tasks.

chatgpt, information, translation, (14 more...)

arXiv: 2403.00127

Country:

Europe > United Kingdom (0.14)
Asia > China (0.04)
Oceania > Australia > Queensland (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Charles Translator: A Machine Translation System between Ukrainian and Czech

Popel, Martin, Poláková, Lucie, Novák, Michal, Helcl, Jindřich, Libovický, Jindřich, Straňák, Pavel, Krabač, Tomáš, Hlaváčová, Jaroslava, Anisimova, Mariia, Chlaňová, Tereza

arXiv.org Artificial Intelligence, Apr-10-2024

We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society. The system was developed in the spring of 2022 with the help of many language data providers in order to quickly meet the demand for such a service, which was not available at the time in the required quality. The translator was later implemented as an online web interface and as an Android app with speech input, both featuring Cyrillic-Latin script transliteration. The system translates directly, unlike other available systems that use English as a pivot, and thus takes advantage of the typological similarity of the two languages. It uses the block back-translation method, which allows for efficient use of monolingual training data. The paper describes the development process (including data collection and implementation) and evaluation, mentions several use cases, and outlines possibilities for the further development of the system for educational purposes.

computational linguistic, proceedings, translation, (11 more...)

arXiv: 2404.06964

Country:

Europe > Ukraine (0.15)
Europe > United Kingdom (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
(14 more...)

Genre: Research Report (0.40)

Industry:

Government (0.47)
Law (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
